CS 4789/5789: Introduction to Reinforcement Learning
Modern Artificial Intelligence (AI) systems often need the ability to make sequential decisions in an unknown, uncertain, and possibly hostile environment by actively interacting with it to collect relevant data.
Reinforcement Learning (RL) is a general framework that captures this interactive learning setting and
has been used to design intelligent agents that achieve superhuman performance on
challenging tasks such as Go, computer games, and robotic manipulation.
This course focuses on the basics of Reinforcement Learning. The four main parts of the course are
(1) basics of Markov Decision Processes (MDPs), (2) planning and control in MDPs, (3) learning in MDPs, and (4) imitation learning.
After taking this course, students will be able to understand classic RL algorithms and their analysis.
All lectures will be math heavy: we will work through algorithms and their analysis.
Staff
Instructor: Wen Sun (Cornell)
TAs: Wen-Ding Li, Hadi Alzayer
Lecture time: Tuesday/Thursday 9:40am - 10:55am ET
Instructor office hours: Tuesday and Thursday after class (10:55am-11:30am)
TAs office hours: Wen-Ding: Friday 3-4pm, Hadi: Wednesday 2:30-3:30pm
Contact: cornellcs4789@gmail.com
Please communicate with the instructor and TAs only through this account.
Course-related emails sent to other addresses may not be responded to in a timely manner.
Recorded Lectures
We are gradually releasing recorded lectures here
Prerequisites
Since lectures are math heavy and we will focus on algorithm design and analysis, we require students to have a strong Machine Learning background (e.g., CS 4780). Students should be comfortable with the basics of probability and linear algebra.
Since homeworks contain programming problems, we expect students to be comfortable with programming. We will use Python as the programming language in ALL homeworks.
Grading Policies
Assignments 70% (HW0: 10%, HW1-HW3: 20% each); Final 30%; Attendance bonus 5%.
All homeworks contain both math and programming problems (we use Python and OpenAI Gym).
The final exam contains only math problems.
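Since the programming problems build on OpenAI Gym, the sketch below shows the basic agent-environment interaction loop. It is only an illustration, not homework code: the environment name CartPole-v1 is an arbitrary choice, and it assumes the classic Gym API in which reset() returns an observation and step() returns (observation, reward, done, info).

    import gym

    # Create an environment (CartPole-v1 is just an illustrative choice;
    # the homeworks may use different environments).
    env = gym.make("CartPole-v1")

    obs = env.reset()                       # start a new episode
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()  # placeholder: a uniformly random policy
        # classic Gym API: step returns (observation, reward, done, info)
        obs, reward, done, info = env.step(action)
        total_reward += reward

    print("Episode return:", total_reward)

In the actual homeworks, the random action above would of course be replaced by whatever policy or algorithm the assignment asks you to implement.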
Homework Rules:
Homework must be done individually: each student must understand, write, and hand in their own submission. Solutions must be typed (we encourage you to use LaTeX).
It is acceptable for students to discuss problems with each other; it is not acceptable for students to look at another student's written answers.
You must also indicate on each homework with whom you collaborated and what online resources you used.
Late days: Homeworks must be submitted by the posted due date.
You are allowed up to 6 total LATE DAYS for the homeworks throughout the entire semester; these are automatically deducted when an assignment is late.
For example, if an assignment is up to 24 hours late, one late day is used; you may use at most two late days on a single assignment.
After your late days are used up, late penalties apply: a late assignment loses 33% of its score for each late day.
So an assignment that is up to 24 hours late incurs a 33% penalty, one that is up to 48 hours late incurs a 66% penalty, and anything later receives no credit.
We will track all your late days, and any deductions will be applied when computing final grades.
If you are unable to turn in homeworks on time, aside from the permitted late days, then do not enroll in the course.
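To make the late-penalty arithmetic concrete, here is a rough sketch, purely illustrative and not official grading code, of the score multiplier applied to an assignment once your 6 free late days are exhausted; it assumes every started 24-hour period counts as one late day.

    import math

    def late_penalty_multiplier(hours_late):
        # Illustrative sketch only (assumes the 6 free late days are already used up).
        days_late = math.ceil(hours_late / 24)  # each started 24-hour period = 1 late day
        if days_late <= 0:
            return 1.0   # on time: full credit
        elif days_late == 1:
            return 0.67  # up to 24 hours late: 33% penalty
        elif days_late == 2:
            return 0.34  # up to 48 hours late: 66% penalty
        else:
            return 0.0   # any later: no credit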
Diversity and Inclusiveness
While many academic disciplines have historically been dominated by one cross section of society,
the study of and participation in STEM disciplines is a joy that the instructors hope everyone can pursue,
regardless of their socio-economic background, race, gender, etc.
We encourage students to both be mindful of these issues, and,
in good faith, try to take steps to fix them. You are the next generation here.
You should expect and demand to be treated by your classmates and the course staff with respect.
You belong here, and we are here to help you learn and enjoy this course.
If any incident occurs that challenges this commitment to a supportive and inclusive environment,
please let the instructors know so that the issue can be addressed. We are personally committed to this, and subscribe to the
Computer Science Department Values of Inclusion.
Honor Code
Collaboration only where explicitly allowed.
Do not use forums like Course Hero or Chegg.
Whatever materials you use for your homeworks, properly cite the references; if you are unclear about whether some online material can be used, ask the instructors and TAs first.
No sharing of your solutions within or outside class at any time
We take academic integrity extremely seriously. The above is not an exhaustive list; in general, all Cornell rules and common-sense rules about academic integrity apply. If we have not explicitly allowed something, ask us whether it is OK before you do it.
Cornell University Code of Academic Integrity,
CS Department Code of Academic Integrity.
Course Notes
The course will sometimes use the working draft of the book "Reinforcement Learning: Theory and Algorithms", available here.
Note that this is an extremely advanced RL theory book; a lot of its material is out of the scope of this class, so we will pick very specific sections for you to read.
If you find typos or errors, please let us know. We would appreciate it!
You can also self-study the classic book "Reinforcement Learning: An Introduction", available here.
Schedule (tentative)
Date | Lecture | Reading | Slides/HW
02/9/21 | Fundamentals: Markov Decision Processes | AJKS: 1.1.1, 1.1.2 | Slides, Annotated Slides
02/11/21 | Fundamentals: Markov Decision Processes (continued) | AJKS: 1.1.1, 1.1.2 | Slides, Annotated Slides
02/16/21 | Fundamentals: Policy Evaluation | AJKS: 1.1.1, 1.1.2 | Slides, Annotated Slides
02/18/21 | Fundamentals: Value Iteration | AJKS: 1.4.1 | Slides, Annotated Slides
02/23/21 | Fundamentals: Policy Iteration | AJKS: 1.4.2 | Slides, Annotated Slides
02/25/21 | Control: Linear Quadratic Regulators (LQRs) | AJKS: 13.1 | Slides, Annotated Slides
03/2/21 | Control: Optimal Control in LQRs | AJKS: 13.2 | Slides, Annotated Slides, HW0 Due
03/4/21 | Control: Control for Nonlinear Systems (Iterative LQR) | Note on iLQR | Slides, Annotated Slides
03/9/21 | No Class | |
03/11/21 | Learning: Model-based RL w/ Generative Model | Note on Simulation Lemma | Slides, Annotated Slides
03/16/21 | Learning: Supervised Learning & Approximate Policy Iteration | AJKS: 4.1 | Slides, Annotated Slides
03/18/21 | Learning: Approximate Policy Iteration & Performance Difference Lemma | Note on PDL | Slides, Annotated Slides, HW1 Due
03/23/21 | Learning: Conservative Policy Iteration | AJKS: 12.1 (up to proof of Theorem 12.2) | Slides, Annotated Slides
03/25/21 | Learning: (Stochastic) Gradient Descent & Policy Gradient | AJKS: 9.1 | Slides, Annotated Slides
03/30/21 | Learning: Policy Gradient (continued) | | Slides, Annotated Slides
04/01/21 | Learning: Trust Region and Natural PG | AJKS: 12.2 | Slides, Annotated Slides
04/6/21 | Learning: NPG (continued) and Review | | Slides, Annotated Slides
04/8/21 | Imitation Learning: Behavior Cloning | | Slides, Annotated Slides
04/13/21 | Imitation Learning: Interactive Learning w/ DAgger | | Slides
04/15/21 | Imitation Learning: DAgger (continued) | | Slides, Annotated Slides, HW2 Due (April 16th)
04/20/21 | Imitation Learning: Maximum Entropy Inverse RL | | Slides, Annotated Slides
04/22/21 | Imitation Learning: MaxEnt-IRL (continued) | Note on MaxEnt RL and Soft VI | Slides, Annotated Slides
04/27/21 | Exploration: Exploration in RL and Multi-Armed Bandits | Note on MAB | Slides, Annotated Slides
04/29/21 | Exploration: Multi-Armed Bandits (continued) | Note on UCB | Slides, Annotated Slides
05/4/21 | Exploration: Contextual Bandit | | Slides, Annotated Slides
05/6/21 | Case Study: AlphaGo | AlphaGo paper link | Slides, Annotated Slides
05/11/21 | Review | | Slides, Annotated Slides, HW3 Due
05/13/21 | No Class | |
Final Week | Final Exam (May 21st 1:30pm ET) | |